Online Adaptation of Convolutional Neural Networks for Video Object Segmentation
We tackle the task of semi-supervised video object segmentation, i.e.
segmenting the pixels belonging to an object in the video using the ground
truth pixel mask for the first frame. We build on the recently introduced
one-shot video object segmentation (OSVOS) approach which uses a pretrained
network and fine-tunes it on the first frame. While achieving impressive
performance, at test time OSVOS uses the fine-tuned network in unchanged form
and is not able to adapt to large changes in object appearance. To overcome
this limitation, we propose Online Adaptive Video Object Segmentation (OnAVOS)
which updates the network online using training examples selected based on the
confidence of the network and the spatial configuration. Additionally, we add a
pretraining step based on objectness, which is learned on PASCAL. Our
experiments show that both extensions are highly effective and improve the
state of the art on DAVIS to an intersection-over-union score of 85.7%.
Comment: Accepted at BMVC 2017. This version contains minor changes for the camera-ready version.
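The online-adaptation idea above can be sketched in a few lines. The following is a hypothetical illustration, not the authors' code: pixels the network already predicts with very high foreground confidence become positive training examples for the online update, while pixels lying far outside the previously known object region become negatives; everything else is left out of the update. The function name, thresholds, and the bounding-box-plus-margin notion of "far" are all illustrative assumptions.

```python
import numpy as np

def select_online_examples(fg_prob, last_mask, pos_thresh=0.97, margin=10):
    """Pick per-pixel labels for an OnAVOS-style online update (sketch).

    fg_prob   : (H, W) array of foreground probabilities for the current frame.
    last_mask : (H, W) boolean mask of the object in the previous frame.
    Returns labels: 1 = positive, 0 = negative, -1 = ignored (uncertain).
    """
    labels = -np.ones(fg_prob.shape, dtype=np.int8)

    # Confidently predicted foreground pixels become positive examples.
    labels[fg_prob > pos_thresh] = 1

    # Pixels far from the previous mask (here: outside its bounding box
    # expanded by `margin`) become negatives, overriding any positives there.
    ys, xs = np.nonzero(last_mask)
    y0, y1 = ys.min() - margin, ys.max() + margin
    x0, x1 = xs.min() - margin, xs.max() + margin
    yy, xx = np.mgrid[:fg_prob.shape[0], :fg_prob.shape[1]]
    far = (yy < y0) | (yy > y1) | (xx < x0) | (xx > x1)
    labels[far] = 0
    return labels
```

The selected positives and negatives would then drive a few gradient steps on the segmentation network before predicting the next frame.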
Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking
The most common paradigm for vision-based multi-object tracking is
tracking-by-detection, due to the availability of reliable detectors for
several important object categories such as cars and pedestrians. However,
future mobile systems will need a capability to cope with rich human-made
environments, in which obtaining detectors for every possible object category
would be infeasible. In this paper, we propose a model-free multi-object
tracking approach that uses a category-agnostic image segmentation method to
track objects. We present an efficient segmentation mask-based tracker which
associates pixel-precise masks reported by the segmentation. Our approach can
utilize semantic information whenever it is available for classifying objects
at the track level, while retaining the capability to track generic unknown
objects in the absence of such information. We demonstrate experimentally that
our approach achieves performance comparable to state-of-the-art
tracking-by-detection methods for popular object categories such as cars and
pedestrians. Additionally, we show that the proposed method can discover and
robustly track a large variety of other objects.
Comment: ICRA'18 submission.
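The mask-based association at the heart of such a tracker can be sketched as follows. This is an illustrative simplification, not the authors' implementation: each existing track is greedily matched to the new segmentation mask that overlaps its most recent mask best, measured by mask IoU, and matches below a threshold are rejected. The function names and the greedy (rather than globally optimal) matching are assumptions for brevity.

```python
import numpy as np

def mask_iou(a, b):
    """Intersection-over-union of two boolean masks."""
    inter = np.logical_and(a, b).sum()
    union = np.logical_or(a, b).sum()
    return inter / union if union else 0.0

def associate(track_masks, new_masks, iou_thresh=0.5):
    """Greedily match tracks to new masks; returns (track_idx, mask_idx) pairs."""
    pairs = []
    used = set()
    for ti, tm in enumerate(track_masks):
        best, best_iou = None, iou_thresh
        for mi, nm in enumerate(new_masks):
            if mi in used:
                continue
            iou = mask_iou(tm, nm)
            if iou > best_iou:
                best, best_iou = mi, iou
        if best is not None:
            pairs.append((ti, best))
            used.add(best)
    return pairs
```

Unmatched new masks would spawn fresh tracks, which is what lets the tracker pick up previously unseen, category-agnostic objects.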
Optimal approximation of C^k-functions using shallow complex-valued neural networks
We prove a quantitative result for the approximation of functions of C^k
regularity (in the sense of real variables) defined on the complex cube using
shallow complex-valued neural networks. Precisely, we consider neural networks
with a single hidden layer of m neurons and show that every such function can
be approximated with an error whose order is governed by the regularity k as m
tends to infinity, provided that the activation function is smooth but not
polyharmonic on some non-empty open set. Furthermore, we show that the
selection of the weights and biases can be made continuous with respect to the
target function, and we prove that the derived rate of approximation is
optimal under this continuity assumption. We also discuss the optimality of
the result for a possibly discontinuous choice of the weights.
Upper and lower bounds for the Lipschitz constant of random neural networks
Empirical studies have widely demonstrated that neural networks are highly
sensitive to small, adversarial perturbations of the input. The worst-case
robustness against these so-called adversarial examples can be quantified by
the Lipschitz constant of the neural network. In this paper, we study upper and
lower bounds for the Lipschitz constant of random ReLU neural networks.
Specifically, we assume that the weights and biases follow a generalization of
the He initialization, where general symmetric distributions for the biases are
permitted. For shallow neural networks, we characterize the Lipschitz constant
up to an absolute numerical constant. For deep networks with fixed depth and
sufficiently large width, our established upper bound is larger than the lower
bound by a factor that is logarithmic in the width.
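The quantity studied above can be probed numerically. The sketch below (an illustration of the setting, not the paper's proof technique) builds a shallow random ReLU network with He-initialized weights and symmetrically distributed biases, and lower-bounds its Lipschitz constant by the largest input-gradient norm observed over random sample points; since a ReLU network is piecewise linear, the gradient norm at any point is a valid lower bound. All function names and the specific sampling scheme are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_relu_net(d_in, width):
    """Shallow ReLU network x -> w2 @ relu(W1 @ x + b1) with He-style init."""
    W1 = rng.normal(0, np.sqrt(2 / d_in), (width, d_in))
    b1 = rng.normal(0, 1.0, width)               # symmetric bias distribution
    w2 = rng.normal(0, np.sqrt(2 / width), width)
    return W1, b1, w2

def grad_norm(params, x):
    """Euclidean norm of the network's gradient with respect to the input x."""
    W1, b1, w2 = params
    active = (W1 @ x + b1) > 0                   # ReLU activation pattern at x
    g = (w2 * active) @ W1                       # gradient of the scalar output
    return np.linalg.norm(g)

params = random_relu_net(d_in=8, width=256)
# Max gradient norm over sampled inputs: a lower bound on the Lipschitz constant.
lower_bound = max(grad_norm(params, rng.normal(size=8)) for _ in range(1000))
```

Matching this empirical lower bound against a theoretical upper bound, as the paper does analytically, is what pins the Lipschitz constant down to a constant (shallow case) or logarithmic (deep case) factor.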
BURST: A Benchmark for Unifying Object Recognition, Segmentation and Tracking in Video
Multiple existing benchmarks involve tracking and segmenting objects in video,
e.g., Video Object Segmentation (VOS) and Multi-Object Tracking and
Segmentation (MOTS), but there is little interaction between them due to the
use of disparate benchmark datasets and metrics (e.g. J&F, mAP, sMOTSA). As a
result, published works usually target a particular benchmark, and are not
easily comparable to one another. We believe that the development of
generalized methods that can tackle multiple tasks requires greater cohesion
among these research sub-communities. In this paper, we aim to facilitate this
by proposing BURST, a dataset which contains thousands of diverse videos with
high-quality object masks, and an associated benchmark with six tasks involving
object tracking and segmentation in video. All tasks are evaluated using the
same data and comparable metrics, which enables researchers to consider them in
unison, and hence, more effectively pool knowledge from different methods
across different tasks. Additionally, we demonstrate several baselines for all
tasks and show that approaches for one task can be applied to another with a
quantifiable and explainable performance difference. Dataset annotations and
evaluation code are available at: https://github.com/Ali2500/BURST-benchmark
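A minimal sketch of the kind of mask-level comparison such a unified benchmark builds on (illustrative only; the official BURST evaluation code at the repository above is the reference): a predicted track and a ground-truth track are compared frame by frame via mask IoU, averaged over frames where at least one mask is non-empty. The function name and the handling of empty frames are assumptions.

```python
import numpy as np

def track_score(pred_masks, gt_masks):
    """Mean per-frame mask IoU between two aligned lists of boolean masks."""
    scores = []
    for p, g in zip(pred_masks, gt_masks):
        union = np.logical_or(p, g).sum()
        if union == 0:          # object absent in both: skip the frame
            continue
        scores.append(np.logical_and(p, g).sum() / union)
    return float(np.mean(scores)) if scores else 1.0
```

Scoring every task against the same pixel-level ground truth in this spirit is what makes results from the six tasks directly comparable.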